Replace Piper/macOS TTS with Pocket TTS #646

Draft
dharmab wants to merge 17 commits into v2-dev from pocket-tts

dharmab commented Mar 18, 2026

Replace Piper TTS (Windows/Linux) and macOS Speech Synthesis with Pocket TTS via sherpa-onnx.

Closes #635

🤖 Generated with Claude Code

dharmab and others added 2 commits March 18, 2026 00:43
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Pocket TTS speaker using sherpa-onnx voice cloning
- Remove Piper and macOS Speech Synthesis backends
- Verify archive hashes before extracting model downloads
- Add WAV decoder bounds checking for malformed files
- Add Application.Close() to release TTS C resources on shutdown
- Deduplicate model setup logic in CLI entrypoint
- Use named constants for model filenames instead of positional indices
- Handle unexpected model verification errors (e.g. permission denied)
- Use "Magic" callsign in integration tests for better TTS recognition

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
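
For readers unfamiliar with the "verify before extract" step in the commit above, here is a minimal sketch of the idea. It is not the code from this PR: `verifyArchive`, `writeTemp`, and `ErrHashMismatch` are hypothetical names, and the digest in `main` is simply the SHA-256 of the bytes `hello`, used as a stand-in for a pinned model hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"os"
)

// ErrHashMismatch (hypothetical) reports that a digest differs from the pinned one.
var ErrHashMismatch = errors.New("archive hash mismatch")

// verifyArchive streams the file at path through SHA-256 and compares the
// hex digest with wantHex. Callers extract only when it returns nil.
func verifyArchive(path, wantHex string) error {
	f, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("open archive: %w", err)
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return fmt.Errorf("hash archive: %w", err)
	}
	got := hex.EncodeToString(h.Sum(nil))
	if got != wantHex {
		return fmt.Errorf("%w: got %s, want %s", ErrHashMismatch, got, wantHex)
	}
	return nil
}

// writeTemp writes data to a temporary file so the example needs no download.
func writeTemp(data []byte) (string, error) {
	f, err := os.CreateTemp("", "model-*.bin")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.Write(data); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	path, err := writeTemp([]byte("hello"))
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.Remove(path)

	// SHA-256 of "hello"; a real caller would pin the model archive's digest.
	const want = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
	if err := verifyArchive(path, want); err != nil {
		fmt.Println("refusing to extract:", err)
		return
	}
	fmt.Println("hash verified, safe to extract")
}
```

Hashing the whole archive before any extraction means a corrupted or tampered download is rejected without writing partial model files to disk.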
dharmab changed the title from "Replace Piper/macOS TTS with Pocket TTS via sherpa-onnx" to "Replace Piper/macOS TTS with Pocket TTS" on Mar 18, 2026
dharmab and others added 15 commits March 18, 2026 02:13
Add a new integration-test job that downloads both Parakeet and Pocket
TTS models and runs the TTS→STT round-trip tests. Gate release and
push-images jobs on integration tests passing.

Fix model download paths in all build jobs from models/parakeet to
models so download-models correctly creates both parakeet/ and pocket/
subdirectories, including Pocket TTS models in release archives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Suppress gosec false positives:
- G101 on model file SHA256 hashes (not credentials)
- G115 on uint16→int16 PCM sample reinterpretation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
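
As context for the G115 suppression above: reading 16-bit PCM typically means loading each little-endian pair as a `uint16` and reinterpreting the bits as `int16`, which is a deliberate wrap rather than an overflow bug. The sketch below illustrates that pattern; `decodePCM16` is a hypothetical helper, not the decoder in this PR.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodePCM16 converts little-endian 16-bit PCM bytes into signed samples.
// A trailing odd byte is ignored instead of being read out of bounds.
func decodePCM16(data []byte) []int16 {
	samples := make([]int16, 0, len(data)/2)
	for i := 0; i+1 < len(data); i += 2 {
		u := binary.LittleEndian.Uint16(data[i : i+2])
		// Intentional bit reinterpretation: 0xFFFF becomes -1, 0x8000 becomes -32768.
		samples = append(samples, int16(u))
	}
	return samples
}

func main() {
	raw := []byte{0xFF, 0xFF, 0x00, 0x80, 0x01, 0x00, 0x7F}
	fmt.Println(decodePCM16(raw)) // prints [-1 -32768 1]
}
```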
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tests advisory

Add digit homophone substitutions (won→1, to/too→2, free/tree→3, for/fore→4,
ate→8, niner→9, tutu→22) and ordinal suffix stripping (5th→5) to
ParsePilotCallsign. Deduplicate consecutive repeated words to handle STT
stutter (e.g. "eagle eagle 2 7" → "eagle 2 7"). Truncate callsign text at
"request" instead of just removing the word.

Add comprehensive unit tests for homophones, ordinals, and stutter
deduplication. Add integration round trip test covering all 81 two-digit
callsign combinations (1-1 through 9-9).

Make CI integration tests advisory (continue-on-error) and run them only
after lint and unit tests pass, since TTS→STT round trips are inherently
nondeterministic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
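
The normalization steps described in the commit above can be sketched roughly as follows. This is an illustration only, not the actual `ParsePilotCallsign`: `normalizeCallsign` and `isDigits` are hypothetical names, the homophone table is copied from the commit message, and the choice to skip deduplication for digits (so a callsign such as 1-1 survives) is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// digitHomophones mirrors the substitutions listed in the commit message.
var digitHomophones = map[string]string{
	"won": "1", "to": "2", "too": "2", "free": "3", "tree": "3",
	"for": "4", "fore": "4", "ate": "8", "niner": "9", "tutu": "22",
}

// ordinal matches tokens such as "5th" or "2nd".
var ordinal = regexp.MustCompile(`^(\d+)(st|nd|rd|th)$`)

// isDigits reports whether s is non-empty and consists only of ASCII digits.
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// normalizeCallsign lowercases text, truncates at "request", maps digit
// homophones, strips ordinal suffixes, and drops consecutive repeated words.
// Assumption: repeated digits are kept so "1 1" is not collapsed to "1".
func normalizeCallsign(text string) string {
	var out []string
	for _, w := range strings.Fields(strings.ToLower(text)) {
		if w == "request" {
			break // everything after "request" is not part of the callsign
		}
		if d, ok := digitHomophones[w]; ok {
			w = d
		} else if m := ordinal.FindStringSubmatch(w); m != nil {
			w = m[1]
		}
		if len(out) > 0 && out[len(out)-1] == w && !isDigits(w) {
			continue // STT stutter, e.g. "eagle eagle"
		}
		out = append(out, w)
	}
	return strings.Join(out, " ")
}

func main() {
	fmt.Println(normalizeCallsign("Eagle eagle to 7th request bogey dope")) // prints eagle 2 7
	fmt.Println(normalizeCallsign("Mobius won won"))                        // prints mobius 1 1
}
```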
Split model version metadata (URLs, hashes, filenames) into dedicated
version.go files so the CI cache key only changes when the actual model
version changes, not when download logic is modified.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n integration tests

Add new bogey dope misrecognitions (bodidoda, bougie, vogie, wajidoke)
to the replacements LUT. Extract callsign similarity threshold to an
exported constant and update integration tests to snap parsed callsigns
to the closest candidate using edit distance, mirroring the real
application's radar database behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
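
A rough sketch of the "snap to the closest candidate" idea from the commit above, using Levenshtein distance. None of these names come from the PR: `SimilarityThreshold` here is a hypothetical value (the commit does not state the real constant), and normalizing the distance by the longer string's length is an assumption of this example.

```go
package main

import "fmt"

// SimilarityThreshold is a hypothetical cutoff; the real constant's value
// is not given in this PR.
const SimilarityThreshold = 0.6

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b, counted in runes.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// similarity maps edit distance to [0, 1], where 1 means identical.
func similarity(a, b string) float64 {
	n := len([]rune(a))
	if m := len([]rune(b)); m > n {
		n = m
	}
	if n == 0 {
		return 1
	}
	return 1 - float64(levenshtein(a, b))/float64(n)
}

// snap returns the candidate most similar to parsed, or false when no
// candidate reaches SimilarityThreshold.
func snap(parsed string, candidates []string) (string, bool) {
	best, bestScore := "", -1.0
	for _, c := range candidates {
		if s := similarity(parsed, c); s > bestScore {
			best, bestScore = c, s
		}
	}
	return best, bestScore >= SimilarityThreshold
}

func main() {
	flights := []string{"eagle 2 7", "mobius 1 1", "wardog 3 4"}
	fmt.Println(snap("eagel 2 7", flights)) // prints eagle 2 7 true
}
```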
Add "dody dot" and "ody do" to the bogey dope replacements LUT.
Rewrite TestRoundTripCallsignNumbers to use a probabilistic approach
with multiple callsign words (Eagle, Mobius, Wardog), request phrasings,
and a multi-flight candidate list, requiring >99% success rate instead
of 100% to account for inherent TTS→STT lossiness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
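
One way to express the ">99% instead of 100%" pass criterion from the commit above is an integer comparison, which avoids floating-point edge cases right at the boundary. This is a sketch under that assumption; `meetsSuccessRate` is a hypothetical helper and the counts in `main` are illustrative, not results from this PR.

```go
package main

import "fmt"

// meetsSuccessRate reports whether successes/total is strictly greater than
// minPercent percent. An empty run never passes.
func meetsSuccessRate(successes, total, minPercent int) bool {
	return total > 0 && successes*100 > total*minPercent
}

func main() {
	// Illustrative counts only.
	fmt.Println(meetsSuccessRate(496, 500, 99)) // prints true  (99.2%)
	fmt.Println(meetsSuccessRate(495, 500, 99)) // prints false (exactly 99%)
}
```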
Set GitHub Actions job timeout to 60 minutes and Go test timeout to
45 minutes to allow the probabilistic callsign round-trip tests to
complete.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use a worker pool (NumCPU/2 goroutines) with independent TTS/STT
pipelines to run integration test permutations concurrently, reducing
wall-clock time from ~10m to ~2m.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
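
The worker-pool shape described in the commit above might look roughly like this. It is a sketch, not the test code from this PR: `pipeline` and `roundTrip` are hypothetical stand-ins for a per-worker TTS/STT pair (the stand-in just lowercases its input), and `runPool` is an invented name.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// pipeline is a hypothetical stand-in for one worker's private TTS and STT
// instances. Giving each worker its own copy avoids sharing engine state.
type pipeline struct{ id int }

// roundTrip fakes a TTS→STT round trip by lowercasing the text.
func (p *pipeline) roundTrip(text string) string {
	return strings.ToLower(text)
}

// runPool processes inputs on the given number of workers and returns the
// results in input order.
func runPool(inputs []string, workers int) []string {
	if workers < 1 {
		workers = 1 // e.g. NumCPU/2 is 0 on a single-core machine
	}
	results := make([]string, len(inputs))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			p := &pipeline{id: id} // one independent pipeline per goroutine
			for i := range jobs {
				// Each index is written by exactly one worker, so no lock is needed.
				results[i] = p.roundTrip(inputs[i])
			}
		}(w)
	}
	for i := range inputs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	inputs := []string{"Eagle 1 1", "Mobius 2 2", "Wardog 3 3"}
	fmt.Println(runPool(inputs, runtime.NumCPU()/2)) // prints [eagle 1 1 mobius 2 2 wardog 3 3]
}
```

Sending indices rather than strings over the channel keeps results aligned with inputs without a second channel or a mutex.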
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add replacements for bodhi, bojy, boy do, boyido, budgie, moji,
ogie, og da, vaughi, vogee, voji with parser unit tests.

Rework integration test to sample 40 random callsigns (20 common,
20 random) and repeat to 500+ inputs for statistical sensitivity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>